Touch and Go: Learning from Human-Collected Vision and Touch
The ability to associate touch with sight is essential for tasks that require physically interacting with objects in the world. We propose a dataset with paired visual and tactile data called Touch and Go, in which human data collectors probe objects in natural environments using tactile sensors, while simultaneously recording egocentric video. In contrast to previous efforts, which have largely been confined to lab settings or simulated environments, our dataset spans a large number of "in the wild" objects and scenes. We successfully apply our dataset to a variety of multimodal learning tasks: 1) self-supervised visuo-tactile feature learning, 2) tactile-driven image stylization, i.e., making the visual appearance of an object more consistent with a given tactile signal, and 3) predicting future frames of a tactile signal from visuo-tactile inputs.
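The abstract does not say which objective Touch and Go uses for self-supervised visuo-tactile feature learning. A common choice for paired-modality data is a contrastive (InfoNCE) loss, in which each image embedding should score highest against the tactile embedding recorded at the same moment. The sketch below assumes that choice. The function names and the 0.07 temperature are illustrative assumptions, and the sketch is not taken from the paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def directional_info_nce(queries: Sequence[Vector], keys: Sequence[Vector],
                         temperature: float) -> float:
    """Mean -log softmax score of keys[i] among all keys, for each queries[i]."""
    q = [l2_normalize(x) for x in queries]
    k = [l2_normalize(x) for x in keys]
    total = 0.0
    for i, qi in enumerate(q):
        # Cosine similarity of query i to every key in the batch, scaled by temperature.
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # the matching pair sits on the diagonal
    return total / len(q)


def visuo_tactile_info_nce(visual: Sequence[Vector], tactile: Sequence[Vector],
                           temperature: float = 0.07) -> float:
    """Symmetric loss: average of the vision-to-touch and touch-to-vision terms."""
    if len(visual) != len(tactile) or not visual:
        raise ValueError("need equally sized, non-empty batches of paired embeddings")
    return 0.5 * (directional_info_nce(visual, tactile, temperature)
                  + directional_info_nce(tactile, visual, temperature))
```

For a toy batch in which each visual embedding equals its tactile partner ([[1, 0], [0, 1]] paired with itself), the loss is close to zero. Swapping the two tactile rows makes it large, which is the signal that pulls matching sight and touch together during training.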
OSMO: Open-Source Tactile Glove for Human-to-Robot Skill Transfer
Yin, Jessica, Qi, Haozhi, Wi, Youngsun, Kundu, Sayantan, Lambeta, Mike, Yang, William, Wang, Changhao, Wu, Tingfan, Malik, Jitendra, Hellebrekers, Tess
Abstract-- Human video demonstrations provide abundant training data for learning robot policies, but video alone cannot capture the rich contact signals critical for mastering manipulation. We introduce OSMO, an open-source wearable tactile glove designed for human-to-robot skill transfer. The glove features 12 three-axis tactile sensors across the fingertips and palm and is designed to be compatible with state-of-the-art hand-tracking methods for in-the-wild data collection. We demonstrate that a robot policy trained exclusively on human demonstrations collected with OSMO, without any real robot data, is capable of executing a challenging contact-rich manipulation task. On a real-world wiping task requiring sustained contact pressure, our tactile-aware policy achieves a 72% success rate, outperforming vision-only baselines by eliminating contact-related failure modes. We release complete hardware designs, firmware, and assembly instructions to support community adoption. Tactile sensing enables humans to excel at manipulation by providing real-time feedback about contact forces that vision alone cannot capture. Consider trying to learn to dice a carrot from video alone: one cannot observe the nuanced force control that makes the task successful. Many different applied forces can result in nearly identical visual appearances, leaving critical information about force control invisible to vision.
- North America > United States > Pennsylvania (0.04)
- North America > United States > Michigan (0.04)
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
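The OSMO glove described above reports 12 three-axis sensors, so one tactile snapshot flattens to 12 × 3 = 36 numbers. The sketch below shows one plausible way to feed that snapshot to a tactile-aware policy: concatenate it with a visual embedding, and add a simple threshold check of the kind that could flag loss of contact during wiping. The fusion by concatenation, the choice of axis index 2 as the normal direction, and the 0.5 threshold are all hypothetical and are not taken from the OSMO release.

```python
from typing import List, Sequence, Tuple

NUM_SENSORS = 12  # fingertip and palm sensors, per the OSMO abstract
NUM_AXES = 3      # each sensor reports a three-axis reading

Reading = Tuple[float, float, float]


def flatten_tactile(readings: Sequence[Reading]) -> List[float]:
    """Flatten 12 three-axis readings into one 36-dimensional vector, sensor by sensor."""
    if len(readings) != NUM_SENSORS:
        raise ValueError(f"expected {NUM_SENSORS} sensors, got {len(readings)}")
    flat: List[float] = []
    for reading in readings:
        if len(reading) != NUM_AXES:
            raise ValueError(f"each sensor must report {NUM_AXES} axes")
        flat.extend(float(x) for x in reading)
    return flat


def build_observation(visual_features: Sequence[float],
                      readings: Sequence[Reading]) -> List[float]:
    """Hypothetical fusion: concatenate a visual embedding with the tactile vector."""
    return [float(x) for x in visual_features] + flatten_tactile(readings)


def in_contact(readings: Sequence[Reading], normal_axis: int = 2,
               threshold: float = 0.5) -> bool:
    """Hypothetical contact check: any sensor's normal component exceeds a threshold."""
    return any(abs(r[normal_axis]) > threshold for r in readings)
```

With an 8-dimensional visual embedding, the fused observation has 8 + 36 = 44 entries. A real policy would likely learn the fusion rather than use raw concatenation.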
A Humanoid Visual-Tactile-Action Dataset for Contact-Rich Manipulation
Kwon, Eunju, Oh, Seungwon, Baek, In-Chang, Park, Yucheon, Kim, Gyungbo, Moon, JaeYoung, Choi, Yunho, Kim, Kyung-Joong
Abstract--Contact-rich manipulation has become increasingly important in robot learning. However, previous studies on robot learning datasets have focused on rigid objects and underrepresented the diversity of pressure conditions in real-world manipulation. To address this gap, we present a humanoid visual-tactile-action dataset designed for manipulating deformable soft objects. The dataset was collected via teleoperation using a humanoid robot equipped with dexterous hands, capturing multimodal interactions under varying pressure conditions. Contact-rich interaction represents a critical gateway for enabling robots to perform complex tasks in real-world environments, yet it remains one of the fundamental challenges in robotic manipulation [1].
- North America > United States > Michigan (0.05)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- North America > United States > Virginia (0.04)
- Information Technology (0.68)
- Law (0.67)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Cross-Sensor Touch Generation
Rodriguez, Samanta, Dou, Yiming, Oller, Miquel, Owens, Andrew, Fazeli, Nima
Today's visuo-tactile sensors come in many shapes and sizes, making it challenging to develop general-purpose tactile representations. This is because most models are tied to a specific sensor design. To address this challenge, we propose two approaches to cross-sensor image generation. The first is an end-to-end method that leverages paired data (Touch2Touch). The second method builds an intermediate depth representation and does not require paired data (T2D2: Touch-to-Depth-to-Touch). Both methods enable the use of sensor-specific models across multiple sensors via the cross-sensor touch generation process. Together, these models offer flexible solutions for sensor translation, depending on data availability and application needs. We demonstrate their effectiveness on downstream tasks such as in-hand pose estimation and behavior cloning, successfully transferring models trained on one sensor to another. Project page: https://samantabelen.github.io/cross_sensor_touch_generation.
- North America > United States > Michigan (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Sensing and Signal Processing > Image Processing (0.88)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)
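The cross-sensor abstract describes two routes from one sensor's tactile image to another's. Touch2Touch is a single end-to-end model learned from paired data. T2D2 goes through an intermediate depth map and needs no pairs. The sketch below shows only how these pieces compose so that a model trained on the target sensor can be reused. The rule "use the paired model when it exists, otherwise go through depth" is an assumption based on "depending on data availability", and the toy lambdas stand in for trained networks.

```python
from typing import Callable, List, Optional

Image = List[List[float]]
Translator = Callable[[Image], Image]


def compose(first: Translator, second: Translator) -> Translator:
    """Return a translator that applies `first`, then `second`."""
    return lambda image: second(first(image))


def make_translator(touch2touch: Optional[Translator],
                    to_depth: Translator, from_depth: Translator) -> Translator:
    """Assumed policy: prefer a paired end-to-end model; otherwise route via depth."""
    if touch2touch is not None:
        return touch2touch
    return compose(to_depth, from_depth)  # source image -> depth -> target image


def run_on_other_sensor(target_model: Callable[[Image], float],
                        translator: Translator, source_image: Image) -> float:
    """Reuse a model trained on the target sensor by translating its input first."""
    return target_model(translator(source_image))


# Toy stand-ins for trained networks (hypothetical, for illustration only).
toy_to_depth: Translator = lambda img: [[1.0 - p for p in row] for row in img]
toy_from_depth: Translator = lambda d: [[2.0 * v for v in row] for row in d]
```

For example, `make_translator(None, toy_to_depth, toy_from_depth)([[0.25]])` gives `[[1.5]]`. Swapping in real Touch2Touch or T2D2 networks would not change this plumbing.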
MLA: A Multisensory Language-Action Model for Multimodal Understanding and Forecasting in Robotic Manipulation
Liu, Zhuoyang, Liu, Jiaming, Xu, Jiadong, Han, Nuowei, Gu, Chenyang, Chen, Hao, Zhou, Kaichen, Zhang, Renrui, Hsieh, Kai Chin, Wu, Kun, Che, Zhengping, Tang, Jian, Zhang, Shanghang
Vision-language-action models (VLAs) have shown generalization capabilities in robotic manipulation tasks by inheriting from vision-language models (VLMs) and learning action generation. Most VLA models focus on interpreting vision and language to generate actions, whereas robots must perceive and interact within the spatial-physical world. This gap highlights the need for a comprehensive understanding of robotic-specific multisensory information, which is crucial for achieving complex and contact-rich control. To this end, we introduce a multisensory language-action (MLA) model that collaboratively perceives heterogeneous sensory modalities and predicts future multisensory objectives to facilitate physical world modeling. Specifically, to enhance perceptual representations, we propose an encoder-free multimodal alignment scheme that innovatively repurposes the large language model itself as a perception module, directly interpreting multimodal cues by aligning 2D images, 3D point clouds, and tactile tokens through positional correspondence. To further enhance MLA's understanding of physical dynamics, we design a future multisensory generation post-training strategy that enables MLA to reason about semantic, geometric, and interaction information, providing more robust conditions for action generation. For evaluation, the MLA model outperforms the previous state-of-the-art 2D and 3D VLA methods by 12% and 24% in complex, contact-rich real-world tasks, respectively, while also demonstrating improved generalization to unseen configurations. Project website: https://sites.google.com/view/open-mla
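The MLA abstract says that 2D images, 3D point clouds, and tactile tokens are aligned "through positional correspondence" but does not spell out the mechanism. One plausible reading is that a 3D point receives the position index of the image patch it projects onto. The sketch below implements that reading with a standard pinhole projection and row-major patch indexing. The `Camera` fields, the patch size, and the sharing of indices are assumptions for illustration and are not MLA's documented design.

```python
from typing import List, NamedTuple, Optional, Sequence, Tuple

Point = Tuple[float, float, float]  # camera-frame coordinates, z pointing forward


class Camera(NamedTuple):
    fx: float
    fy: float
    cx: float
    cy: float
    width: int
    height: int
    patch: int  # side length of a square image patch, in pixels


def point_to_patch_id(point: Point, cam: Camera) -> Optional[int]:
    """Project a 3D point with a pinhole model and return the row-major index of
    the image patch it lands in, or None if it is not visible."""
    x, y, z = point
    if z <= 0:
        return None  # behind the camera
    u = cam.fx * x / z + cam.cx
    v = cam.fy * y / z + cam.cy
    if not (0 <= u < cam.width and 0 <= v < cam.height):
        return None  # projects outside the image
    cols = cam.width // cam.patch
    return int(v // cam.patch) * cols + int(u // cam.patch)


def assign_position_ids(points: Sequence[Point], cam: Camera) -> List[Optional[int]]:
    """Give each point the position index of its image patch (None if unseen)."""
    return [point_to_patch_id(p, cam) for p in points]
```

With a 224 × 224 image, 16-pixel patches, and the principal point at the image center, a point straight ahead at (0, 0, 1) falls in patch row 7, column 7, which is index 105.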
exUMI: Extensible Robot Teaching System with Action-aware Task-agnostic Tactile Representation
Xu, Yue, Wei, Litao, An, Pengyu, Zhang, Qingyu, Li, Yong-Lu
Tactile-aware robot learning faces critical challenges in data collection and representation due to data scarcity and sparsity, and to the absence of force feedback in existing systems. To address these limitations, we introduce a tactile robot learning system with both hardware and algorithm innovations. We present exUMI, an extensible data collection device that enhances the vanilla UMI with robust proprioception (via AR MoCap and a rotary encoder), modular visuo-tactile sensing, and automated calibration, achieving 100% data usability. Building on an efficient collection of over 1M tactile frames, we propose Tactile Prediction Pretraining (TPP), a representation learning framework based on action-aware temporal tactile prediction, which captures contact dynamics and mitigates tactile sparsity. Real-world experiments show that TPP outperforms traditional tactile imitation learning. Our work bridges the gap between human tactile intuition and robot learning through co-designed hardware and algorithms, offering open-source resources to advance contact-rich manipulation research. Project page: https://silicx.github.io/exUMI.
- Asia > China > Shanghai > Shanghai (0.04)
- North America > United States > Pennsylvania (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)
- Education (0.67)
- Information Technology (0.46)
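TPP is described above as action-aware temporal tactile prediction: the model learns from predicting future tactile frames given the actions taken. The sketch below shows how training examples for such an objective might be cut from a synchronized recording. Each example holds a short window of past tactile frames, the actions that follow, and the tactile frame `horizon` steps ahead as the target. The names `PredictionSample` and `make_tpp_samples` and the `context_len` and `horizon` parameters are hypothetical. The exUMI abstract does not give TPP's actual windowing or conditioning.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence


@dataclass
class PredictionSample:
    context: List[Any]  # past tactile frames, oldest first, ending at time t
    actions: List[Any]  # actions executed at times t .. t + horizon - 1
    target: Any         # tactile frame observed at time t + horizon


def make_tpp_samples(tactile: Sequence[Any], actions: Sequence[Any],
                     context_len: int, horizon: int) -> List[PredictionSample]:
    """Slide a window over a time-aligned recording to build prediction examples."""
    if len(tactile) != len(actions):
        raise ValueError("tactile frames and actions must be time-aligned")
    if context_len < 1 or horizon < 1:
        raise ValueError("context_len and horizon must be positive")
    samples: List[PredictionSample] = []
    # t is the last context step; the target t + horizon must exist in the recording.
    for t in range(context_len - 1, len(tactile) - horizon):
        samples.append(PredictionSample(
            context=list(tactile[t - context_len + 1 : t + 1]),
            actions=list(actions[t : t + horizon]),
            target=tactile[t + horizon],
        ))
    return samples
```

For a 10-step recording with `context_len=3` and `horizon=2`, this yields 6 examples. The first uses frames 0 to 2 and actions 2 to 3 to predict frame 4.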
Design of scalable orthogonal digital encoding architecture for large-area flexible tactile sensing in robotics
Liu, Weijie, Qiu, Ziyi, Wang, Shihang, Mei, Deqing, Wang, Yancheng
Human-like embodied tactile perception is crucial for next-generation intelligent robotics. Achieving large-area, full-body soft coverage with high sensitivity and rapid response, akin to human skin, remains a formidable challenge because of critical bottlenecks in encoding efficiency and wiring complexity in existing flexible tactile sensors, which significantly hinder the scalability and real-time performance required for human skin-level tactile perception. Herein, we present a new architecture employing code division multiple access-inspired orthogonal digital encoding to overcome these challenges. Our decentralized encoding strategy transforms conventional serial signal transmission by enabling parallel superposition of energy-orthogonal base codes from distributed sensing nodes, drastically reducing wiring requirements and increasing data throughput. We implemented and validated this strategy with an off-the-shelf 16-node sensing array to reconstruct the pressure distribution, achieving a temporal resolution of 12.8 ms using only a single transmission wire. Crucially, the architecture can maintain sub-20 ms latency across orders-of-magnitude variations in node count (up to thousands of nodes). By fundamentally redefining signal encoding paradigms in soft electronics, this work opens new frontiers in developing scalable embodied intelligent systems with human-like sensory capabilities.
- Asia > China > Zhejiang Province > Hangzhou (0.04)
- Asia > China > Jiangsu Province > Yancheng (0.04)
- North America > United States (0.04)
- Asia > Japan (0.04)
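The CDMA-inspired idea above rests on one property: when every node spreads its value over a mutually orthogonal code, all nodes can add their signals on a single wire, and the receiver can still separate them by correlating with each code. The sketch below demonstrates this in an idealized, noise-free form for 16 nodes. It uses Walsh-Hadamard codes (built with Sylvester's construction) and multiplies each node's reading by its ±1 code. The choice of Hadamard codes and of amplitude modulation is an illustrative assumption. The paper's actual base codes and digital signaling may differ.

```python
from typing import List, Sequence


def hadamard(n: int) -> List[List[int]]:
    """Sylvester construction: rows are mutually orthogonal +/-1 codes of length n."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        # H_2k = [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def encode(values: Sequence[float], codes: Sequence[Sequence[int]]) -> List[float]:
    """Each node scales its code by its reading; the shared wire carries the chip-wise sum."""
    length = len(codes[0])
    return [sum(v * code[k] for v, code in zip(values, codes)) for k in range(length)]


def decode(signal: Sequence[float], codes: Sequence[Sequence[int]]) -> List[float]:
    """Correlate the superposed signal with each code; orthogonality cancels the others."""
    length = len(signal)
    return [sum(s * c for s, c in zip(signal, code)) / length for code in codes]
```

Encoding readings 0 to 15 from a 16-node array with `hadamard(16)` and decoding the single summed signal recovers all 16 readings exactly. In hardware, noise and timing offsets would make the recovery approximate.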
Taccel: Scaling Up Vision-based Tactile Robotics via High-performance GPU Simulation
Li, Yuyang, Du, Wenxin, Yu, Chang, Li, Puhao, Zhao, Zihang, Liu, Tengyu, Jiang, Chenfanfu, Zhu, Yixin, Huang, Siyuan
Tactile sensing is crucial for achieving human-level robotic capabilities in manipulation tasks. As a promising solution, Vision-Based Tactile Sensors (VBTSs) offer high spatial resolution and cost-effectiveness, but present unique challenges in robotics for their complex physical characteristics and visual signal processing requirements. The lack of efficient and accurate simulation tools for VBTSs has significantly limited the scale and scope of tactile robotics research. We present Taccel, a high-performance simulation platform that integrates IPC and ABD to model robots, tactile sensors, and objects with both accuracy and unprecedented speed, achieving an 18-fold acceleration over real-time across thousands of parallel environments. Unlike previous simulators that operate at sub-real-time speeds with limited parallelization, Taccel provides precise physics simulation and realistic tactile signals while supporting flexible robot-sensor configurations through user-friendly APIs. Through extensive validation in object recognition, robotic grasping, and articulated object manipulation, we demonstrate precise simulation and successful sim-to-real transfer. These capabilities position Taccel as a powerful tool for scaling up tactile robotics research and development, potentially transforming how robots interact with and understand their physical environment.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Asia > China > Beijing > Beijing (0.04)
- Asia > China > Hubei Province > Wuhan (0.04)
- Africa > Central African Republic > Ombella-M'Poko > Bimbo (0.04)
TacCompress: A Benchmark for Multi-Point Tactile Data Compression in Dexterous Hand
Zhao, Yan, Li, Yang, Cheng, Zhengxue, Zhang, Hengdi, Song, Li
Though robotic dexterous manipulation has progressed substantially in recent years, challenges like in-hand occlusion still necessitate fine-grained tactile perception, leading to the integration of more tactile sensors into robotic hands. Consequently, the increased data volume imposes substantial bandwidth pressure on signal transmission from the hand's controller. However, the acquisition and compression of multi-point tactile signals based on the dexterous hands' physical structures have not been thoroughly explored. In this paper, our contributions are twofold. First, we introduce a Multi-Point Tactile Dataset for Dexterous Hand Grasping (Dex-MPTD). This dataset captures tactile signals from multiple contact sensors across various objects and grasping poses, offering a comprehensive benchmark for advancing dexterous robotic manipulation research. Second, we investigate both lossless and lossy compression on Dex-MPTD by converting tactile data into images and applying six lossless and five lossy image codecs for efficient compression. Experimental results demonstrate that tactile data can be losslessly compressed to as low as 0.0364 bits per sub-sample (bpss), achieving approximately a 200× compression ratio compared to the raw tactile data. Efficient lossy compressors like HM and VTM can achieve about 1000× data reductions while preserving acceptable data fidelity. The exploration of lossy compression also reveals that screen-content-targeted coding tools outperform general-purpose codecs in compressing tactile data.
- Europe > Ireland > Leinster > County Dublin > Dublin (0.06)
- Asia > China > Shanghai > Shanghai (0.05)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- North America > United States > New York > New York County > New York City (0.04)
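The TacCompress figures above are reported in bits per sub-sample (bpss), together with a compression ratio relative to the raw data. The sketch below shows how such numbers are computed for a lossless pipeline. Tactile readings are packed row by row into an 8-bit "image" (one row per time step, one column per sub-sample), the bytes are compressed, bpss is taken as compressed bits divided by the number of sub-samples, and the ratio is the raw bit depth divided by bpss. It treats each sub-sample as one 8-bit scalar reading, which is an assumption. It also uses Python's general-purpose `zlib` rather than any of the image codecs the benchmark evaluated, so its numbers are not comparable to the paper's.

```python
import zlib
from typing import Sequence, Tuple


def frames_to_bytes(frames: Sequence[Sequence[int]]) -> bytes:
    """Pack frames row-major: one row per time step, one 8-bit value per sub-sample."""
    out = bytearray()
    for frame in frames:
        for value in frame:
            if not 0 <= value <= 255:
                raise ValueError("this sketch assumes 8-bit sub-samples")
            out.append(value)
    return bytes(out)


def lossless_stats(frames: Sequence[Sequence[int]],
                   raw_bits_per_subsample: int = 8) -> Tuple[float, float]:
    """Return (bits per sub-sample, compression ratio) using zlib as the codec."""
    raw = frames_to_bytes(frames)
    if not raw:
        raise ValueError("no sub-samples to compress")
    compressed = zlib.compress(raw, 9)
    if zlib.decompress(compressed) != raw:  # verify the round trip is lossless
        raise RuntimeError("round trip failed")
    bpss = 8 * len(compressed) / len(raw)
    return bpss, raw_bits_per_subsample / bpss
```

Sparse tactile streams, where most taxels read zero most of the time, compress far below 1 bpss even with zlib. That sparsity is what leaves room for the large ratios reported above.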